fix: [Bug] Pipeline parameters (ParameterInteger, ParameterString) fail in ModelTrain (5504) by aviruthen · Pull Request #28 · aviruthen/sagemaker-python-sdk

aviruthen · 2026-03-26T23:09:50Z

Description

PipelineVariable Support in ModelTrainer Fields (GH#5524)

This PR ensures that ModelTrainer fields that accept StrPipeVar (Union of str and PipelineVariable) work correctly when PipelineVariable objects (e.g., ParameterString) are passed.

Changes

sagemaker-train/src/sagemaker/train/utils.py: Updated _get_repo_name_from_image to handle PipelineVariable objects gracefully by returning a default name instead of attempting string operations on non-string types.
sagemaker-train/src/sagemaker/train/model_trainer.py: Updated _validate_training_image_and_algorithm_name to properly detect PipelineVariable instances as truthy values during validation, since PipelineVariable objects may not support standard boolean coercion.

Testing

Verified with unit tests in test_model_trainer_pipeline_variable.py that:

training_image, algorithm_name, training_input_mode accept ParameterString
environment dict values accept ParameterString
Plain string values continue to work (regression tests)
Invalid types (e.g., int) are still rejected

Related Issue

Related issue: 5504

Changes Made

No response from agent

AI-Generated PR

This PR was automatically generated by the PySDK Issue Agent.

Confidence score: 0%
Classification: bug
SDK version target: V3

Merge Checklist

Changes are backward compatible
Commit message follows prefix: description format
Unit tests added/updated
Integration tests added (if applicable)
Documentation updated (if applicable)

aviruthen

🤖 AI Code Review

This PR fixes PipelineVariable support in ModelTrainer but has several issues: removed type annotations without replacement, a magic string constant, and the test file referenced in the description is not included in the diff. The approach is reasonable but needs refinement.

aviruthen · 2026-03-26T23:10:42Z


    def _validate_training_image_and_algorithm_name(
-        self, training_image: Optional[str], algorithm_name: Optional[str]
+        self, training_image, algorithm_name


Missing type annotations: The type annotations for training_image and algorithm_name were removed entirely. Per SDK coding standards (PEP 484), all public/private methods must retain type annotations. Since these parameters now accept both str and PipelineVariable, please use the appropriate union type:

    def _validate_training_image_and_algorithm_name(
        self,
        training_image: str | PipelineVariable | None,
        algorithm_name: str | PipelineVariable | None,
    ):

Or if StrPipeVar is already defined as a type alias in the codebase, use that.
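
A rough sketch of the alias option, assuming StrPipeVar is simply a Union of str and PipelineVariable as the PR description states. The PipelineVariable class below is a local stand-in rather than the SDK class, and the function name is hypothetical:

```python
from typing import Optional, Union, get_type_hints


class PipelineVariable:
    """Local stand-in for the SDK's PipelineVariable (illustration only)."""


# Assumed shape of the alias, per the PR description: str or PipelineVariable.
StrPipeVar = Union[str, PipelineVariable]


def validate_image_and_algorithm(
    training_image: Optional[StrPipeVar],
    algorithm_name: Optional[StrPipeVar],
) -> None:
    """Hypothetical signature only; the body is omitted on purpose."""


# The annotations are preserved and can be inspected by type-checking tools.
hints = get_type_hints(validate_image_and_algorithm)
```

Using the alias keeps the signature short and keeps the accepted types in one place if they ever change.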

aviruthen · 2026-03-26T23:10:42Z


    Args:
-        image (str): The image URI
+        image: The image URI (str or PipelineVariable)


Removed type annotation: Same issue here — the image parameter lost its type annotation. Please restore it with the correct union type:

def _get_repo_name_from_image(image: str | PipelineVariable) -> str:

aviruthen · 2026-03-26T23:10:42Z


    def _validate_training_image_and_algorithm_name(
-        self, training_image: Optional[str], algorithm_name: Optional[str]
+        self, training_image, algorithm_name


Missing test file in the diff: The PR description references test_model_trainer_pipeline_variable.py but this file is not included in the changed files. Please ensure the test file is included in the PR. Without tests, we cannot verify the fix works or guard against regressions.

aviruthen

🤖 AI Code Review

This PR fixes PipelineVariable support in ModelTrainer by handling non-string PipelineVariable objects in validation and utility functions. The approach is reasonable, but there are some issues: a line exceeds 100 characters, test fixtures have significant duplication that should be extracted, and the file uses the older from __future__ import absolute_import instead of from __future__ import annotations.

aviruthen · 2026-03-26T23:17:34Z

+        mock_session = MagicMock()
+        mock_session.boto_region_name = "us-east-1"
+        mock_session.default_bucket.return_value = "my-bucket"
+        mock_session.default_bucket_prefix = None


The mock session and mock_defaults setup is duplicated across 4 test methods (test_training_image_accepts_parameter_string, test_algorithm_name_accepts_parameter_string, test_environment_values_accept_parameter_string, test_plain_string_values_still_work). Extract this into a @pytest.fixture to reduce duplication and improve maintainability:

    @pytest.fixture
    def mock_session():
        session = MagicMock()
        session.boto_region_name = "us-east-1"
        session.default_bucket.return_value = "my-bucket"
        session.default_bucket_prefix = None
        return session

And similarly for the mock_defaults patching.

aviruthen · 2026-03-26T23:17:34Z

+        mock_defaults.get_role.return_value = "arn:aws:iam::123456789012:role/SageMakerRole"
+        mock_defaults.get_base_job_name.return_value = "test-job"
+        mock_defaults.get_compute.return_value = Compute(
+            instance_type="ml.m5.xlarge", instance_count=1


Consider adding a test case for _validate_training_image_and_algorithm_name where one argument is a PipelineVariable and the other is None — this is the primary success case the fix enables. The current tests test_training_image_accepts_parameter_string and test_algorithm_name_accepts_parameter_string test this indirectly through full ModelTrainer construction, but a direct unit test of the validation method (like the rejection tests at lines 199-238) would be more focused and faster.

aviruthen

🤖 AI Code Review

This PR fixes PipelineVariable support in ModelTrainer by handling PipelineVariable objects that don't support standard boolean coercion. The approach is reasonable, but there are a few issues: a duplicate import in utils.py, the validation logic could be simplified, and the test file has imports inside test methods rather than at module level.

aviruthen · 2026-03-26T23:21:32Z

 from datetime import datetime
 from typing import Literal, Any

+from typing import Union


Duplicate import: Union is imported here from typing, but there's already a from typing import Literal, Any on line 24. Consolidate into a single import statement:

from typing import Literal, Any, Union

Also, the module already imports PipelineVariable from sagemaker.core.workflow.parameters on line 30 but does not include from __future__ import annotations. Consider adding that import to enable PEP 604 union syntax (str | PipelineVariable) per SDK conventions.

aviruthen · 2026-03-26T23:21:32Z

+        # PipelineVariable objects do not support standard boolean coercion
+        # (__bool__ raises TypeError), so we use isinstance checks to detect
+        # them as truthy values during validation.
+        has_image = isinstance(training_image, PipelineVariable) or bool(training_image)


The expression isinstance(training_image, PipelineVariable) or bool(training_image) is safe as written: or short-circuits, so bool() is never called on a PipelineVariable and the TypeError from __bool__ cannot occur. However, consider simplifying to:

    has_image = training_image is not None and training_image != ""
    has_algo = algorithm_name is not None and algorithm_name != ""

This avoids calling bool() entirely and is more explicit about what "not provided" means (None or empty string). The is not None check naturally handles PipelineVariable objects correctly.
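
To illustrate the suggestion, here is a minimal sketch using a local stand-in class whose __bool__ raises TypeError, mirroring the code comment in the diff. The stand-in inherits default object comparison; if the real PipelineVariable overrides __eq__ or __ne__, the != "" comparison should be checked against it. The helper name is_provided is hypothetical:

```python
class PipelineVariable:
    """Local stand-in whose __bool__ raises, mirroring the PR's code comment."""

    def __bool__(self):
        raise TypeError("stand-in: boolean coercion is not supported")


def is_provided(value):
    """Return True unless the value is None or an empty string."""
    # Neither comparison calls bool() on the value itself.
    return value is not None and value != ""


param = PipelineVariable()
results = {
    "pipeline_variable": is_provided(param),
    "plain_string": is_provided("my-image:latest"),
    "empty_string": is_provided(""),
    "none": is_provided(None),
}
```

With this stand-in, only the PipelineVariable and the non-empty string count as provided.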

aviruthen · 2026-03-26T23:21:32Z



-def _get_repo_name_from_image(image: str) -> str:
+def _get_repo_name_from_image(image: Union[str, PipelineVariable]) -> str:


The return type annotation says -> str but when a PipelineVariable is passed, it returns a string placeholder, so the annotation is technically correct. However, consider documenting in the docstring that the placeholder _PIPELINE_VARIABLE_IMAGE_PLACEHOLDER is returned for PipelineVariable inputs, so downstream callers understand the behavior.

aviruthen · 2026-03-26T23:21:32Z

+        from sagemaker.train.model_trainer import ModelTrainer
+        from sagemaker.train.configs import Compute
+
+        param = ParameterString(


The from sagemaker.train.model_trainer import ModelTrainer import is repeated inside every test method in this class. Move it to the top of the file with the other imports. Inline imports in tests add unnecessary noise and are not consistent with SDK test conventions.

aviruthen · 2026-03-26T23:21:32Z

+_TEST_IMAGE_URI = (
+    "683313688378.dkr.ecr.us-east-1.amazonaws.com/"
+    "sagemaker-xgboost:1.0-1-cpu-py3"
+)


Nit: This test image URI contains a hardcoded region (us-east-1) and account ID (683313688378). While this is acceptable for unit tests since it's just a string constant and not used to make actual API calls, consider using a clearly fake account ID (e.g., 123456789012) for consistency with the mock session fixture below.

aviruthen · 2026-03-26T23:21:32Z

+        assert trainer.training_image == _TEST_IMAGE_URI
+
+    def test_validation_accepts_pipeline_variable_image_none_algo(self):
+        """Test validation accepts PipelineVariable image with None algorithm."""


Using ModelTrainer.__new__(ModelTrainer) to bypass __init__ and directly test the validation method is fragile — it creates an uninitialized object. If _validate_training_image_and_algorithm_name ever accesses self attributes, these tests will break with confusing errors. Consider either:

Making _validate_training_image_and_algorithm_name a @staticmethod (it doesn't use self), or

Using the existing mock_train_defaults fixture to construct a proper instance and test through the public interface.
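
The first option could look roughly like the sketch below. It assumes the method enforces that exactly one of the two arguments is provided; the Trainer class and the PipelineVariable stand-in are hypothetical and are not the SDK classes:

```python
from typing import Optional, Union


class PipelineVariable:
    """Local stand-in for the SDK's PipelineVariable (illustration only)."""


class Trainer:
    """Hypothetical class; only the validation helper is sketched."""

    @staticmethod
    def _validate_training_image_and_algorithm_name(
        training_image: Optional[Union[str, PipelineVariable]],
        algorithm_name: Optional[Union[str, PipelineVariable]],
    ) -> None:
        # Assumed rule: exactly one of the two must be provided.
        has_image = training_image is not None and training_image != ""
        has_algo = algorithm_name is not None and algorithm_name != ""
        if has_image == has_algo:
            raise ValueError(
                "Provide exactly one of 'training_image' or 'algorithm_name'."
            )


# No instance (and no __new__ workaround) is needed to exercise the check.
Trainer._validate_training_image_and_algorithm_name(PipelineVariable(), None)
```

Because the helper never touches self, tests can call it on the class directly, including the case where one argument is a PipelineVariable and the other is None.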

aviruthen · 2026-03-26T23:21:32Z

+
+class TestSafeSerializeWithPipelineVariable:
+    """Tests for safe_serialize handling of PipelineVariable objects."""
+


The TestSafeSerializeWithPipelineVariable tests verify safe_serialize behavior with PipelineVariable, but the PR diff doesn't show any changes to safe_serialize. If safe_serialize already handled PipelineVariable correctly, these tests are documenting existing behavior (which is fine), but it would be good to note that in the test class docstring. If safe_serialize needed changes, those changes should be included in this PR.

fix: [Bug] Pipeline parameters (ParameterInteger, ParameterString) fail in ModelTrain (5504)

9450aee

fix: address review comments (iteration #1)

a996ac7

fix: address review comments (iteration #2)

471ee99

aviruthen closed this Mar 26, 2026

aviruthen deleted the fix/bug-pipeline-parameters-parameterinteger-5504 branch March 26, 2026 23:22


